SYSTEM AND METHOD FOR HIGH-SPEED AND BULK BACKUP



The present invention relates to a system and method for high-speed and
bulk backup, and more particularly to a backup system that protects the data
stored on a storage device within a system from viruses, accidents, etc.,
wherein the data of a volume is divided into numerous units such as blocks,
and a plurality of threads compress the units sequentially and transfer them
to different storage devices. Because several flows run simultaneously within
a process, both the time required for backup and the time required for data
compression can be reduced.

BACKGROUND ART 

According to the U.S. Institute of Emergency Planning, it was reported that
the average loss to industries from data losses caused by computer faults
had already reached one hundred thousand dollars per hour as of 1994. The
report stressed that, regardless of the financial loss, data backup and
recovery would be the most important matter directly related to national
competitiveness and security, not only for business enterprises but also for
government offices dealing with national data resources under the slogan of
electronic government.

Recently, as all industrial sectors are being converted to the Internet
environment, the amount of corporate and personal data continues to rise in
geometric progression. Accordingly, the construction or expansion of advanced
enterprise computing environments based on storage, such as data warehouses,
enterprise resource planning, customer relationship management, knowledge
management, etc., is growing on a large scale.

The storage installed in the various types of businesses stated above must
be extended by hundreds of megabytes or dozens of gigabytes per day. The task
of maintaining and protecting bulky data from a natural disaster such as a
flood or fire, or from an unexpected calamity such as terrorism, a fault, or
an accident, has therefore become essential to the survival of business
enterprises.

Under these circumstances, leading companies such as Veritas, IBM, CA, and
Legato have developed backup solutions such as NetBackup, Tivoli, BrightStor,
and NetWorker, providing software with which the data stored on backup object
disks, i.e. the main storage devices connected to the main system, can be
backed up onto backup disks such as tape libraries or disk libraries. There
are various types of backup solutions, such as direct backup, network backup,
SAN backup, server-less backup, etc.

The types of backup solutions are summarized as follows. As illustrated in
FIG. 1, direct backup is a backup solution in which tape drives are connected
independently to each server. It has the advantages of placing no load on the
network and of providing a speedy backup; however, purchasing the tape drives
and backup software is costly, and centralized management is difficult. As a
result, it is useful only if the number of servers to be backed up is fewer
than three and the capacity of each server is less than 100 gigabytes.

As illustrated in FIG. 2, network backup is a backup solution in which one
of the many servers connected to a network is assigned as a backup server,
and the backup server backs up the other servers via the network. Its merits
are that centralized management is easy and that the cost of purchasing
backup equipment and software is low; however, because high-volume data is
transferred over the same network during a backup, it places an excessive
load on the network.

SAN backup, not shown, is a backup solution in which servers, storage, and
backup devices are connected via Fibre Channel; it requires a large
investment but has the highest backup performance. Server-less backup, in
turn, is a backup solution that achieves good performance by distributing the
function of the backup server, thereby reducing the rate of CPU usage.

However, the conventional backup solutions stated above still have a
problem: the more backup files or data there are within a main storage
device, the lower the backup speed becomes.

Therefore, reducing the time required for backup and recovery to a minimum
is an important issue. Compression, which allows more data to be stored
efficiently within the limited capacity of the tape libraries or disk
libraries on which the backup data is stored, is another key issue.



DISCLOSURE OF THE INVENTION 
The present invention is provided to solve the problems stated above, and
it is an object of the invention to provide backup and recovery of system
data at a higher speed.

It is another object of the invention to improve the efficiency of a
storage device by using compression, so that more data can be backed up and
recovered within the limited capacity of storage devices.

In order to accomplish these objects, a system for high-speed and bulk
backup includes a backup object disk on which backup object data is stored; a
backup disk on which the backup object data is compressed and stored; and a
backup means, wherein a volume of backup object data stored on the backup
object disk is divided into unit data of a predetermined size, a plurality of
threads running several flows within a process are generated, and the divided
unit data are thereby sequentially compressed and stored onto the backup
disk.

Preferably, the system for high-speed and bulk backup further includes an
input/output unit, through which commands including backup operating commands
are supplied and the result of a given command is output; and a central
processing unit, which processes the backup operating command supplied
through the input/output unit so that a backup can be implemented by the
backup means.

Moreover, the backup means includes a backup master module, which receives
a backup operating command supplied through the input/output unit and the
central processing unit and transmits it to a backup manager module; the
backup manager module, which receives from the backup master module the
backup operating command required for operating a backup, manages the backup
reservation information for each volume, collects and manages the backup
status and backup history information for each volume, and transmits the
backup command for a disk volume to a backup agent module according to a
backup schedule; and the backup agent module, which receives the backup
commands from the backup manager module, divides the volume of data on a
backup object disk into unit data of a predetermined size, and generates a
plurality of threads running several flows within a process, so that the
divided unit data can be sequentially compressed and stored onto the backup
disk.

Preferably, another embodiment of the invention comprises a backup master
server including a backup master module; and a plurality of backup manager
servers, each including a backup manager module and a backup agent module and
having a backup object disk and a backup disk, wherein, when a command
including backup operating commands is received by the backup master server
and transmitted to the backup manager server, the backup manager module
manages the backup reservation information for each volume, collects and
manages the backup status and backup history information for each volume, and
transmits the backup command for a disk volume to the backup agent module
according to a backup schedule; then, according to the backup command
supplied from the backup manager module, the backup agent module divides a
volume of data on the backup object disk into unit data of a predetermined
size, generates a plurality of threads running several flows within a
process, and sequentially compresses the divided unit data and stores them
onto the backup disk.

Moreover, still another embodiment of the invention comprises a backup
master server including a backup master module; a plurality of backup manager
servers, each including the backup manager module and having backup object
disks; and a backup agent server including the backup agent module and having
backup disks, wherein, when a command including the backup operating commands
is received by the backup master server and transmitted to the backup manager
server, the backup manager module within the backup manager server manages
the backup reservation information for each volume, divides a volume of data
into unit data of a predetermined size, reads the unit data and transmits
them to the backup agent server, collects and manages the backup status and
backup history information for each volume according to the backup progress
on the backup agent server side, and transmits the backup command for a disk
volume of the backup object disk to the backup agent server according to a
backup schedule; then, according to the backup command supplied from the
backup manager module, the backup agent module within the backup agent server
generates a plurality of threads, receives the unit data of the predetermined
size in order, and has the generated threads sequentially compress the unit
data and store them onto the backup disk.

Preferably also, during the recovery of data stored on a backup disk, the
divided and compressed unit data are restored in reverse order using a thread
technique, and the most suitable predetermined unit size for implementing a
backup and recovery is "block size (4096 x N) x number of blocks (M) = 20-25
Mbytes".

When the backup object data stored on a backup object disk of the backup
manager server comprises more than one hundred thousand files, volume backup
is faster; in volume backup, the whole volume of backup object data is
divided into unit data by accessing a raw device, regardless of the type of
file. When the backup object data comprises fewer than one hundred thousand
files, file backup is faster; in file backup, each file is divided into unit
data, sequentially compressed using a thread technique, and stored on a
backup disk of the backup server. It is therefore preferable that either file
backup or volume backup can be selectively implemented in the backup manager
server according to the number of files in the backup object data.
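
The selection rule above can be sketched as follows. This is a minimal
illustration in Python, not the implementation of the invention: the names
`count_files` and `choose_backup_mode` are hypothetical, and treating a count
of exactly one hundred thousand as file backup is an assumption, since the
text only covers the cases above and below that figure.

```python
import os

# Threshold taken from the text: one hundred thousand files.
FILE_COUNT_THRESHOLD = 100_000


def count_files(root: str) -> int:
    """Count the regular directory entries reported as files under root."""
    return sum(len(files) for _, _, files in os.walk(root))


def choose_backup_mode(file_count: int,
                       threshold: int = FILE_COUNT_THRESHOLD) -> str:
    """Return "volume" above the threshold, otherwise "file".

    Assumption: a count exactly equal to the threshold selects file backup.
    """
    return "volume" if file_count > threshold else "file"
```

A caller would pass `count_files(mount_point)` to `choose_backup_mode` and
dispatch to the matching backup routine.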

A method of high-speed and bulk backup according to the invention
comprises the steps of: receiving information on the compression object disk
and on the directory in which the data is to be stored; driving a plurality
of compression threads; dividing the block index values supplied from the
compression object disk among the driven compression threads, each of which
reads its values; reading, for each compression thread, the data block
belonging to the block index that was read; compressing the data blocks read
by the plurality of compression threads simultaneously; storing the
compressed data blocks in the storage directory for the plurality of
compression threads; judging whether more data blocks remain to be compressed
and, if so, increasing the block index and returning to the reading of a data
block; finishing the plurality of threads if no data blocks remain to be
compressed; and completing the backup by ensuring that compression of all
data blocks is completed.

Preferably, the input at the step of driving the compression threads is a
block index, the input to the data compression means while compression is in
progress is a data block to be compressed, and the output is a compressed
data block.

Preferably also, backup data can be restored in the reverse order of the
backup method described above, and the compression can be performed by the
threads either sequentially on unit data obtained by dividing the data on a
volume, or sequentially on a plurality of files.



BRIEF DESCRIPTION OF DRAWINGS 
FIG. 1 is a block diagram showing conventional direct backup.

FIG. 2 is a block diagram showing conventional network backup.

FIG. 3 is a block diagram showing a preferred embodiment of the backup
system according to the present invention.

FIG. 4 is a block diagram showing another preferred embodiment of the
backup system according to the present invention.

FIG. 5 is a block diagram showing still another preferred embodiment of
the backup system according to the present invention.

FIG. 6 is an exemplary diagram showing in detail a method of dividing a
volume according to the present invention.

FIG. 7 is a flowchart showing a method of backup according to the present
invention.



BEST MODE FOR CARRYING OUT THE INVENTION 
Hereinafter, the preferred embodiments of the present invention will be
described in detail with reference to the accompanying diagrams.

As illustrated in FIG. 3, a block diagram showing a system of high-speed
and bulk backup according to the present invention, the system of high-speed
and bulk backup 100 is implemented in a form integrated into a computer
system; the elements of the computer system not directly related to the
invention are not shown.

The system of high-speed and bulk backup 100 shown in FIG. 3 comprises
modules, namely a backup master module 10, a backup manager module 20, and a
backup agent module 30, as units each performing one or more specific
functions; an input/output unit 50 through which commands including backup
operating commands are received from outside; and a central processing unit
40 that, through the commands supplied through the input/output unit 50,
controls a backup object disk 60 on which the data to be backed up is stored,
a backup disk 70 on which the backup object data stored on the backup object
disk is compressed and stored, and the modules 10, 20, 30.

In concrete terms, the backup master module 10, as the element that manages
the overall backup system, manages backup reservation information for each
volume and provides backup commands to the backup manager modules 20
according to a backup schedule.

Here, backup reservation information means data such as from which disk, to
which disk, at which time, and for which period, set up by a backup
administrator for automatic backup. The backup master module 10 therefore
operates automatically according to the reserved backup schedule so that a
backup proceeds on the backup manager module 20 and the backup agent module
30.
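
As an illustration only, the backup reservation information described above
could be held in a small record such as the following Python sketch. The
class name `BackupReservation`, its field names, and the `next_run` helper
are hypothetical; in particular, reading "for which period" as a repeat
interval is an assumption, since the text does not define it further.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupReservation:
    """One reservation entry: from which disk, to which disk, when, how often."""
    source_disk: str      # backup object disk ("from which disk")
    target_disk: str      # backup disk ("to which disk")
    first_run: datetime   # "at which time"
    period: timedelta     # "for which period" (assumed: positive repeat interval)

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled run at or after `now`."""
        if now <= self.first_run:
            return self.first_run
        elapsed = now - self.first_run
        # Ceiling division on timedeltas: number of whole periods needed.
        periods = -(-elapsed // self.period)
        return self.first_run + periods * self.period
```

A scheduler in the backup master module could poll `next_run` for each
reservation and issue the backup command when that time arrives.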

On the other hand, when there is a plurality of backup manager modules 20,
it is preferable for the backup master module 10 to manage backups by
bundling multiple backup manager modules 20 into a group.

The backup manager module 20 receives the backup operating commands
required for backup management from the backup master module 10 and transmits
them to the backup agent module 30; it also collects the backup status and
history for each volume from the information on the backup being implemented
by the backup agent module 30, and transmits them to the backup master module
10.

Also, the backup agent module 30 is configured to receive backup or
recovery commands from the backup manager module 20 and to implement a backup
or recovery according to the commands. When it receives a command for
implementing a backup of a backup object disk 60, it divides a volume of data
within the backup object disk 60 into unit data and reads them, generates n
threads, and sequentially compresses the unit data read from the backup
object disk 60 and stores them on the backup disk 70.

Besides, the backup agent module 30 collects and manages backup information
for each volume while implementing the backup, and reports the status of the
backup in progress to the backup manager module 20.

For reference, a thread is a kind of module obtained by dividing the
various jobs within a process into small, separate job units; a program can
be internally divided into threads that are executed simultaneously.

In this manner, the system of high-speed and bulk backup according to the
invention can reduce the time required for backup, increase the compression
rate substantially, and store more data on the same backup disk, because the
data within a backup object disk 60 can be divided into unit data and read,
and the data read can be compressed simultaneously by a plurality of threads
and stored onto a backup disk 70.

FIG. 4 is a block diagram showing another preferred embodiment of the
invention, comprising a backup manager server 300 and a backup master server
200 for sending backup commands to the backup manager server 300. Compared
with the components shown in FIG. 3, the backup manager server 300 includes a
backup manager module 20, a backup agent module 30, a backup object disk 60,
and a backup disk 70, and the backup master server 200 includes a backup
master module 10.

Here, the backup master server 200 and the backup manager server 300 can be
connected via an interface or a network, and a tree-type configuration is
possible in which a plurality of backup manager servers 300 are managed by a
backup master server 200.

The configuration and operation shown in FIG. 4 do not differ greatly from
those shown in FIG. 3. When connected via an open network such as the
Internet, a plurality of backup manager servers 300, which conceptually
correspond to clients of a backup master server 200, are managed by the
backup master server 200 through the backup operating commands received
according to the backup reservation information. On the backup manager server
300 side, the backup command received at the backup manager module 20 is
transmitted to the backup agent module 30, and the backup agent module 30 can
be configured so that a volume of data from the backup object disk 60 is
divided into unit data of a predetermined size and read, a plurality of
threads are generated, and the divided unit data are sequentially compressed
and stored on the backup disk 70.
According to the embodiment shown in FIG. 4, in this manner, the time
required for backup can be reduced, the compression rate increased
substantially, and more data stored on the same backup disk, because the data
within a backup object disk 60 can be divided into unit data and read, and
the data read can be compressed simultaneously by a plurality of threads and
stored on a backup disk 70. Moreover, the clients connected via an open
network such as the Internet, i.e. temporary backup manager servers 300, can
be managed and administered as a bundled group.

FIG. 5 is a block diagram showing still another preferred embodiment of the
invention. Here, a backup master server 200, a backup manager server 300, and
a backup agent server 400 are each configured as a separate server, and these
individual servers are connected via an interface or a network for
implementing a backup. Moreover, a plurality of backup manager servers 300
can be connected to a backup master server 200, and each backup manager
server 300 can in turn be connected to each backup agent server 400.

In this case, a backup object disk 60 on which the data is stored is
configured with each backup manager server 300, whereas a backup disk 70 on
which the compressed data of the backup object disk 60 is stored is
configured with each backup agent server 400.

As shown in FIG. 5, a command including backup operating commands is
received at a backup master server 200 and transmitted to a backup manager
server 300. The backup reservation information for each volume is then
managed by the backup manager module 20 within the backup manager server 300,
and a volume of data on the backup object disk is divided into unit data of a
predetermined size, read, and transmitted to a backup agent server 400.

On the backup agent server 400 side, a plurality of threads are generated
according to the backup command received from the backup manager server 300,
and the unit data supplied from the backup manager server 300 are then
sequentially received, compressed by the plurality of threads, and stored on
a backup disk.
As illustrated in FIG. 6, the volume data within a backup object disk 60
can be divided into a plurality of unit data by a backup agent module 30 or a
backup agent server 400. When four threads are used for a volume, the indexes
are assigned sequentially as 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, ..., and each
thread reads the data belonging to its index and carries out the compression
process. An experimental result shows that the most suitable size for the
divided unit data is "block size (4096 x N) x number of blocks (M) = 20-25
Mbytes" for implementing a backup at high speed.
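
The index assignment and the unit-size formula can be illustrated with a
short Python sketch. The function names are hypothetical, the unit numbering
starting at 0 is an assumption, and the values N = 16 and M = 320 are example
figures chosen only because they give a unit of exactly 20 x 1024 x 1024
bytes; the text does not state which N and M were used.

```python
BLOCK_BASE = 4096  # base block size in bytes, from "4096 x N"


def unit_size_bytes(n: int, m: int) -> int:
    """Unit size = block size (4096 x N) x number of blocks (M)."""
    return BLOCK_BASE * n * m


def thread_index_for_unit(unit_number: int, num_threads: int) -> int:
    """Map a 0-based unit number to a 1-based thread index: 1, 2, ..., T, 1, 2, ..."""
    return unit_number % num_threads + 1


def units_for_thread(index: int, total_units: int, num_threads: int) -> list[int]:
    """List the 0-based unit numbers handled by the thread with 1-based `index`."""
    return list(range(index - 1, total_units, num_threads))


if __name__ == "__main__":
    print(unit_size_bytes(16, 320))                          # example N and M
    print([thread_index_for_unit(i, 4) for i in range(10)])  # four threads
    print(units_for_thread(2, 10, 4))
```

With four threads, thread 2 handles every fourth unit starting from the
second, so no two threads ever read the same unit.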

As illustrated in FIG. 7, a flowchart showing a method of high-speed and
bulk backup, a backup command from a backup manager module 20 or a backup
manager server 300 can be supplied to a backup agent module 30 or a backup
agent server 400 for implementing a backup.

According to FIG. 7, a backup agent module 30 or a backup agent server 400
receives information about the compression object disk and the directory in
which the data is to be stored from a backup manager module 20 or a backup
manager server 300 (step S1).

Then, a plurality of multiplexed compression threads are driven by the
backup agent module 30 or the backup agent server 400, the input at this time
being a block index value (step S2), and the value received at step S2 is
divided among and read by the plurality of compression threads (step S3).

Subsequently, each data block for the block index is read from the
compression object disk by the compression threads (step S4), and each data
block received for compression is then compressed (step S5).

The compressed data blocks produced at step S5 are stored in the storage
directory (step S6). It is then judged whether any more data blocks remain to
be compressed (step S7); if so, the block index is increased (step S10) and
the process returns to step S3, where another data block can be read.

When the judgment at step S7 shows that no more data blocks remain to be
compressed, the plurality of compression threads are finished (step S8), and
the backup procedure is then completed by ensuring that compression of all
data blocks has been completed.
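
The flow of steps S1 to S8 can be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not the implementation of the
invention: `backup_volume` and `restore_volume` are hypothetical names, zlib
is a stand-in for the unspecified compression means, the source is assumed to
be a regular file or disk image whose size `os.path.getsize` can report, and
each compressed unit is written to its own file in the storage directory.

```python
import os
import threading
import zlib


def backup_volume(source_path: str, storage_dir: str,
                  unit_size: int, num_threads: int = 4) -> int:
    """Compress source_path unit by unit with several threads (S1 to S8)."""
    os.makedirs(storage_dir, exist_ok=True)                   # S1: target directory
    total = os.path.getsize(source_path)
    num_units = (total + unit_size - 1) // unit_size
    errors: list[BaseException] = []

    def worker(first_index: int) -> None:                     # S3: own index sequence
        try:
            with open(source_path, "rb") as src:
                index = first_index
                while index < num_units:                      # S7: more blocks left?
                    src.seek(index * unit_size)               # S4: read the data block
                    packed = zlib.compress(src.read(unit_size))   # S5: compress
                    name = os.path.join(storage_dir, f"unit_{index:08d}.z")
                    with open(name, "wb") as dst:             # S6: store the block
                        dst.write(packed)
                    index += num_threads                      # S10: increase the index
        except BaseException as exc:                          # surfaced after join
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,))     # S2: drive the threads
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:                                         # S8: finish the threads
        t.join()
    if errors:
        raise errors[0]
    return num_units


def restore_volume(storage_dir: str, target_path: str, num_units: int) -> None:
    """Decompress the stored units in index order back into one file."""
    with open(target_path, "wb") as dst:
        for index in range(num_units):
            name = os.path.join(storage_dir, f"unit_{index:08d}.z")
            with open(name, "rb") as src:
                dst.write(zlib.decompress(src.read()))
```

Each thread opens its own file handle so that seeks do not interfere, and
the stride of `num_threads` between indexes keeps the threads on disjoint
blocks.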

Here, it is also possible to confirm whether the bulk backup has been
completed correctly. As a detailed method, once the backup and recovery
procedure has been completed, it is checked again whether the backup was
performed properly: for example, the data on a backup object disk is backed
up to a backup disk and restored to the backup object disk again, and the
correctness of the restored data is then checked by comparing the data
content of the backup object disk with that of the backup disk. This type of
verification can be used as a method of securing the stability of the backup.
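
One way to carry out such a comparison is sketched below in Python. It is an
assumption-laden illustration: the text only says that the data contents are
compared, and comparing SHA-256 digests is a substituted technique chosen here
so that large volumes need not be held in memory. The names `file_digest` and
`verify_restore` are hypothetical.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_path: str, restored_path: str) -> bool:
    """True if the restored data has the same digest as the original data."""
    return file_digest(original_path) == file_digest(restored_path)
```

A byte-by-byte comparison would serve equally well; the digest form is
convenient when the two copies sit on different machines and only the digest
needs to be exchanged.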

Though the preferred embodiments of the present invention have been
described above in detail, it will be apparent to those skilled in the art
that various modifications and variations can be made in the present
invention within the scope of the appended claims and their equivalents.

INDUSTRIAL APPLICABILITY

According to the present invention, the time required for backup and
recovery can be reduced substantially, and the size of the data after a
backup can be reduced drastically. Excellent backup performance can therefore
be secured for users, and the TCO (Total Cost of Ownership) of backup
resources can also be reduced substantially.

Besides, the invention can provide safe protection for users in an
E-business environment requiring an enormous amount of data. Furthermore, its
high-speed and bulk backup performance and its powerful data compression
function, which have not been available in existing backup management
solutions, can be used effectively for high-speed and bulk backup in the
areas of ASP/ISP, communications, banking, on-line services, and business
enterprises.